Data visualization

Author

Oleksandr Koval

Mission

This document created to make easier understanding data that we are working.

Introduction

In this document we will observe some tables:

  • ds_year (describes the number of titles and the number of copies for each year, since 2018)

  • ds_genre (describes the number of titles and the number of copies for each year, by genre)

  • ds_georaphy (describes the number of titles and the number of copies for each year, by region)

  • ds_language (describes the number of titles and the number of copies for each year, by language)

  • ds_pubtype (describes the number of titles and the number of copies for each year, by publication type)

  • ds_ukr_rus (describes the number of titles and the number of copies, and the percentage of Ukrainian and Russian books for each year)

P.s: All tables contain information for 2005–2024, except for ds_language

Preparing

Load the libraries used in this report.

load libraries
library(dplyr)
library(ggplot2)
library(tidyr)
library(plotly)

Data import

We will be working with data that describes two main measures: the number of titles (title_count) and the number of copies (copy_count). Several specific tables provide these measures under different conditions.

Load data
getwd()
ds_genre <- readRDS("../../data-private/derived/manipulation/ds_genre.rds") # load ds_genre

ds_year <- readRDS("../../data-private/derived/manipulation/ds_year.rds") # load ds_year

ds_language  <- readRDS("../../data-private/derived/manipulation/ds_language.rds") # load ds_language

ds_geography  <- readRDS("../../data-private/derived/manipulation/ds_geography.rds") # load ds_geography

ds_pubtype  <- readRDS("../../data-private/derived/manipulation/ds_pubtype.rds") # load ds_pubtype

ds_ukr_rus  <- readRDS("../../data-private/derived/manipulation/ds_ukr_rus.rds") # load ds_ukr_rus

What we going to do?

We will create a short visualization for each table to make the data easier to understand. You will see several graphs that describe the main features of the data.

Lets start !

DS_YEAR

This table has 40 rows and 3 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). The value column contains the numeric data for both measures.

The first graph shows the number of copies for each year.

Making graph
years_all <- seq(min(ds_year$yr), max(ds_year$yr))

g_year_copy <- ds_year %>% 
  filter(measure == "copy_count") %>% 
  ggplot(aes(x = yr, y = value)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = years_all) +             
  scale_y_continuous(breaks = seq(0, 80000, by = 10000), limits = c(0, 80000)) +  
  labs(
    title = "Number of Copies by Year"
    , x = "The year"
    , y = "Number of copies (ths.)"
  )+
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank()
    ,panel.border = element_rect(color = "black", fill = NA, size = 1)
    ) 

g_year_copy_plotly <- ggplotly(g_year_copy)

g_year_copy_plotly

The second graph observe the number of titles for each year.

Making graph
g_year_title <- ds_year %>% 
  filter(
    measure == "title_count"
  ) %>% 
  ggplot(aes(x = yr, y = value)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = years_all) +
  scale_y_continuous(breaks = seq(0, 27500, by = 5000), limits = c(7500, 27500)) +
  theme_minimal() +
  labs(
    title = "Number of Titles by Year"
    , x = "The year"
    , y = "Number of titles"
  ) +
  theme(panel.grid.minor = element_blank()
        ,panel.border = element_rect(color = "black", fill = NA, size = 1)) 


g_year_title_plotly <- ggplotly(g_year_title)

g_year_title_plotly
Making graph
rm(g_year_title, g_year_copy, years_all)

What can we observe?

  • The most productive year for new titles and the year with the most copies

  • The overall production trend

DS_GENRE

This table has 40 rows and 15 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). Starting from the third column, each column represents a separate genre.

In the first graph, we see the number of copies for each genre, by year.

Making graph
ds_genre_copy <- 
  ds_genre %>% 
  filter(
    measure == "copy_count"
  ) %>% 
  group_by(yr)  %>%
  pivot_longer(
    cols = -c(yr, measure),  # adding genre column
    names_to = "genre",
    values_to = "value"
  )


g_genre_copy<- ds_genre_copy %>%
  ggplot(aes(x = genre, y = value, fill = genre)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 30000, by = 7500), limits = c(0, 37000)) +
  labs(
    title = "Number of Copies by Genre and Year",
    x = "Genre",
    y = "Number of Copies (ths.)",
    fill = "Genre"
  ) +
  theme_get() +
  theme(
    axis.text.x = element_blank(),      
    axis.title.x = element_blank() 
    ,legend.position = "bottom"
    , panel.spacing.y = unit(5, "lines")
  )

g_genre_copy_plotly <- ggplotly(g_genre_copy) %>%
  layout(
    legend = list(
      orientation = "h",           
      x = 0.5,                     
      y = -0.1,                   
      xanchor = "center"
    )
  )

g_genre_copy_plotly

In a second graph, we can see the number of titles for each genre, by year.

Making graph
ds_genre_title <- 
  ds_genre %>% 
  filter(
    measure == "title_count"
  ) %>% 
  group_by(yr) %>%
  pivot_longer(
    cols = -c(yr, measure),  # adding genre column
    names_to = "genre",
    values_to = "value"
  )

g_genre_title <- ds_genre_title %>%
  ggplot(aes(x = genre, y = value, fill = genre)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  labs(
    title = "Number of Titles by Genre and Year",
    x = "Genre",
    y = "Number of Titles",
    fill = "Genre"
  ) +
  theme_get() +
  theme(
    axis.text.x = element_blank(),      
    axis.title.x = element_blank()  
    ,legend.position = "bottom"
    ,  panel.spacing.y = unit(5, "lines"),

  )

g_genre_title_plotly <- ggplotly(g_genre_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_genre_title_plotly
Making graph
rm(g_genre_copy, ds_genre_copy, ds_genre_title, g_genre_title)

DS_GEOGRAPHY

This table has 40 rows and 28 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). Starting from the third column, each column represents a separate geographical region.

do some preparing
region_groups <- list(
  "Західна Україна" = c("Львівська", "Івано-Франківська", "Закарпатська", "Тернопільська", "Чернівецька", "Волинська", "Рівненська"),
  "Центральна Україна" = c("Київська", "Житомирська", "Черкаська", "Кіровоградська", "Полтавська", "Вінницька"),
  "Південна Україна" = c("Одеська", "Миколаївська", "Херсонська", "Запорізька"),
  "Східна Україна" = c("Харківська", "Донецька", "Луганська", "Дніпропетровська"),
  "Північна Україна" = c("Чернігівська", "Сумська")
  , "Крим" = "Автономна Республіка Крим"
)

region_group_df <- tibble::tibble(
  region_name = unlist(region_groups),
  group = rep(names(region_groups), times = sapply(region_groups, length))
)

g_geography <- ds_geography %>%
  pivot_longer(
    cols = -c(yr, measure),    
    names_to = "region_name",
    values_to = "value"
  )  %>%
  left_join(region_group_df, by = "region_name") %>%
  select(yr, measure, region_name, group, everything())

Inspecting the number of copies

Central Ukraine (Copy count)

This graph describes the number of copies for each region in Central Ukraine

Making graph
g_geography_central_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Центральна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 750, by = 250)) +      
  coord_cartesian(ylim = c(0, 750)) +                       
  labs(
    title = "Number of Copies by Region (Central Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank()
     ,  panel.spacing.y = unit(6, "lines")
     ,  panel.spacing.x = unit(2, "lines")

  )

g_geography_central_copies_plotly <- ggplotly(g_geography_central_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_central_copies_plotly
Making graph
rm(g_geography_central_copies)

North Ukraine (Copy count)

This graph describes the number of copies for each region in North Ukraine

Making graph
g_geography_pivnichna_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Північна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 500, by = 250)) +      
  coord_cartesian(ylim = c(0, 500)) +                       
  labs(
    title = "Number of Copies by Region (North Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_pivnichna_copies_plotly <- ggplotly(g_geography_pivnichna_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_pivnichna_copies_plotly
Making graph
rm(g_geography_pivnichna_copies)

South Ukraine (Copy count)

This graph describes the number of copies for each region in South Ukraine

Making graph
g_geography_pivdena_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Південна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 500, by = 250)) +      
  coord_cartesian(ylim = c(0, 500)) +                       
  labs(
    title = "Number of Copies by Region (South Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_pivdena_copies_plotly <- ggplotly(g_geography_pivdena_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_pivdena_copies_plotly
Making graph
rm(g_geography_pivdena_copies)

West Ukraine (Copy count)

This graph describes the number of copies for each region in West Ukraine

Making graph
g_geography_zahidna_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Західна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 1250, by = 500)) +      
  coord_cartesian(ylim = c(0, 1250)) +                       
  labs(
    title = "Number of Copies by Region (West Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_zahidna_copies_plotly <- ggplotly(g_geography_zahidna_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_zahidna_copies_plotly
Making graph
rm(g_geography_zahidna_copies)

East Ukraine (Copy count)

This graph describes the number of copies for each region in East Ukraine

Making graph
g_geography_shidna_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Східна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 3500, by = 1000)) +      
  coord_cartesian(ylim = c(0, 3500)) +                       
  labs(
    title = "Number of Copies by Region (East Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_shidna_copies_plotly <- ggplotly(g_geography_shidna_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_shidna_copies_plotly
Making graph
rm(g_geography_shidna_copies)

Crimea (Copy count)

This graph describes the number of copies for Crimea

Making graph
g_geography_krym_copies <- 
  g_geography %>% 
  filter(
    measure == "copy_count",
    group == "Крим"
  ) %>% 
  ggplot(aes(x = yr, y = value, group = 1)) +   # add group=1 for geom_line
  geom_point() +
  geom_line() +
  labs(
    title = "Number of Copies by Region (Krym) and Year",
    x = "Year",
    y = "Number of Copies (ths.)"
  ) +
  scale_y_continuous(breaks = seq(0, 1500, by = 250), limits = c(0, 1500)) +  
  scale_x_continuous(breaks = 2005:2024) +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = "none"  # No legend needed, since only one line/region
  )

g_geography_krym_copies_plotly <- ggplotly(g_geography_krym_copies)

g_geography_krym_copies_plotly
Making graph
rm(g_geography_krym_copies)

Inspectin the number of titles

Central Ukraine (Titles count)

This graph describes the number of titles for each region in Central Ukraine

Making graph
g_geography_central_title <- g_geography %>%
  filter(measure == "title_count", group == "Центральна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 750, by = 250)) +      
  coord_cartesian(ylim = c(0, 750)) +                       
  labs(
    title = "Number of Titles by Region (Central Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_central_title_plotly <- ggplotly(g_geography_central_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_central_title_plotly
Making graph
rm(g_geography_central_title)

North Ukraine (Titles count)

This graph describes the number of titles for each region in North Ukraine

Making graph
g_geography_pivnichna_title <- g_geography %>%
  filter(measure == "title_count", group == "Північна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 750, by = 250)) +      
  coord_cartesian(ylim = c(0, 750)) +                       
  labs(
    title = "Number of Titles by Region (North Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_pivnichna_title_plotly <- ggplotly(g_geography_pivnichna_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_pivnichna_title_plotly
Making graph
rm(g_geography_pivnichna_title)

South Ukraine (Titles count)

This graph describes the number of titles for each region in South Ukraine

Making graph
g_geography_pivdena_title <- g_geography %>%
  filter(measure == "title_count", group == "Південна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 750, by = 250)) +      
  coord_cartesian(ylim = c(0, 750)) +                       
  labs(
    title = "Number of Titles by Region (South Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_pivdena_title_plotly <- ggplotly(g_geography_pivdena_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_pivdena_title_plotly
Making graph
rm(g_geography_pivdena_title)

West Ukraine (Titles count)

This graph describes the number of titles for each region in West Ukraine

Making graph
g_geography_zahidna_title <- g_geography %>%
  filter(measure == "title_count", group == "Західна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 1250, by = 500)) +      
  coord_cartesian(ylim = c(0, 1250)) +                       
  labs(
    title = "Number of Titles by Region (West Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_zahidna_title_plotly <- ggplotly(g_geography_zahidna_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_zahidna_title_plotly
Making graph
rm(g_geography_zahidna_title)

East Ukraine (Titles count)

This graph describes the number of titles for each region in East Ukraine

Making graph
g_geography_shidna_title <- g_geography %>%
  filter(measure == "title_count", group == "Східна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 4000, by = 1000)) +      
  coord_cartesian(ylim = c(0, 4000)) +                       
  labs(
    title = "Number of Titles by Region (East Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_shidna_title_plotly <- ggplotly(g_geography_shidna_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_shidna_title_plotly
Making graph
rm(g_geography_shidna_title)

Crimea (Titles count)

This graph describes the number of titles for Crimea

Making graph
g_geography_krym_title <- 
  g_geography %>% 
  filter(
    measure == "title_count",
    group == "Крим"
  ) %>% 
  ggplot(aes(x = yr, y = value, group = 1)) +  # group=1 for line plot
  geom_point() +
  geom_line() +
  labs(
    title = "Number of Titles by Region (Krym) and Year",
    x = "Year",
    y = "Number of Titles"
  ) +
  scale_y_continuous(breaks = seq(0, 1000, by = 250), limits = c(0, 1000)) +  
  scale_x_continuous(breaks = 2005:2024) +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = "none"  # no legend needed
  )

g_geography_krym_title_plotly <- ggplotly(g_geography_krym_title)

g_geography_krym_title_plotly
Making graph
rm(g_geography_krym_title)

DS_LANGUAGE

This table has 14 rows and 39 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). Starting from the third column, each column represents a new language. Data is available only for the years 2018–2024.

To analyze this table, we will examine each year separately. Each graph displays only those languages that have at least 10 different titles and 10,000 copies.

do some preparing
g_language <- ds_language %>%
  pivot_longer(
    cols = -c(yr, measure),     
    names_to = "language",
    values_to = "value"
  )

2018 (Language)

Making graph
g_language_2018 <- 
  g_language %>% 
  filter(
    yr == 2018,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька") 
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +  
  labs(
    title = "Number of Copies and Titles (2018)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom"
    , axis.text.x = element_text(size = 8) 
  )

g_language_2018_plotly <- ggplotly(g_language_2018) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2018_plotly
Making graph
rm(g_language_2018)

2019 (Language)

Making graph
g_language_2019 <- 
  g_language %>% 
  filter(
    yr == 2019,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Rename languages as needed
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 55000, by = 10000), limits = c(0, 55000)) +  
  labs(
    title = "Number of Copies and Titles (2019)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2019_plotly <- ggplotly(g_language_2019) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2019_plotly
Making graph
rm(g_language_2019)

2020 (Language)

Making graph
g_language_2020 <- 
  g_language %>% 
  filter(
    yr == 2020,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Rename language as needed:
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +  
  labs(
    title = "Number of Copies and Titles (2020)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2020_plotly <- ggplotly(g_language_2020) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2020_plotly
Making graph
rm(g_language_2020)

2021 (Language)

Making graph
g_language_2021 <- 
  g_language %>% 
  filter(
    yr == 2021,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Rename language as needed:
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +  
  labs(
    title = "Number of Copies and Titles (2021)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2021_plotly <- ggplotly(g_language_2021) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2021_plotly
Making graph
rm(g_language_2021)

2022 (Language)

Making graph
g_language_2022 <- 
  g_language %>% 
  filter(
    yr == 2022,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Optional: rename any language here
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 15000, by = 5000), limits = c(0, 15000)) +  
  labs(
    title = "Number of Copies and Titles (2022)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2022_plotly <- ggplotly(g_language_2022) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2022_plotly
Making graph
rm(g_language_2022)

2023 (Language)

Making graph
g_language_2023 <- 
  g_language %>% 
  filter(
    yr == 2023,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Optional: rename any language as needed
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 25000, by = 5000), limits = c(0, 25000)) +  
  labs(
    title = "Number of Copies and Titles (2023)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2023_plotly <- ggplotly(g_language_2023) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2023_plotly
Making graph
rm(g_language_2023)

2024 (Language)

Making graph
g_language_2024 <- 
  g_language %>% 
  filter(
    yr == 2024,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Optional: rename any language as needed
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 35000, by = 5000), limits = c(0, 35000)) +  
  labs(
    title = "Number of Copies and Titles (2024)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2024_plotly <- ggplotly(g_language_2024) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2024_plotly
Making graph
rm(g_language_2024)

DS_PUBTYPE

This table has 40 rows and 16 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). Starting from the third column, each column represents a different publication type.

The first graph shows the number of copies for each publication type, by year.

Making graph
ds_pubtype_l <- ds_pubtype %>%
  pivot_longer(
    cols = -c(yr, measure),  
    names_to = "pubtype",      
    values_to = "value"       
  )


ds_pubtype_copy <- 
  ds_pubtype_l %>%
  filter(measure == "copy_count") %>%
  group_by(yr)

g_pubtype_copy <- ds_pubtype_copy %>%
  ggplot(aes(x = pubtype, y = value, fill = pubtype)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 30000, by = 7500), limits = c(0, 37000)) +
  coord_cartesian(ylim = c(0, 37000)) +
  labs(
    title = "Number of Copies by Publication Type and Year",
    x = "Publication Type",
    y = "Number of Copies (ths.)",
    fill = "Publication Type"
  ) +
  theme_get() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank()
        ,legend.position = "bottom"
    , panel.spacing.y = unit(5, "lines")
  )

g_pubtype_copy_plotly <- ggplotly(g_pubtype_copy) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_pubtype_copy_plotly
Making graph
rm(ds_pubtype_l, ds_pubtype_copy, g_pubtype_copy)

Second graph describes the number of titles of each publication type, by year.

Making graph
ds_pubtype_l <- ds_pubtype %>%
  pivot_longer(
    cols = -c(yr, measure),  
    names_to = "pubtype",      
    values_to = "value"       
  )

ds_pubtype_title <- 
  ds_pubtype_l %>%
  filter(measure == "title_count") %>%
  group_by(yr)

g_pubtype_title <- ds_pubtype_title %>%
  ggplot(aes(x = pubtype, y = value, fill = pubtype)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 9000, by = 2500), limits = c(0, 9000)) +
  coord_cartesian(ylim = c(0, 9000)) +
  labs(
    title = "Number of Titles by Publication Type and Year",
    x = "Publication Type",
    y = "Number of Titles",
    fill = "Publication Type"
  ) +
  theme_get() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    legend.position = "bottom",
    panel.spacing.y = unit(5, "lines")
  )

g_pubtype_title_plotly <- ggplotly(g_pubtype_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_pubtype_title_plotly
Making graph
rm(ds_pubtype_title, g_pubtype_title)

DS_UKR_RUS

This table has 40 rows and 16 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). The third column shows the percentage of Ukrainian books among Ukrainian and Russian editions. The fourth column shows the percentage of Russian books among Ukrainian and Russian editions.

The first graph shows the number of copies of Ukrainian and Russian books for each year from 2005 to 2024.

Making graph
g_ukrrus_copies <- 
  ds_ukr_rus %>%
  filter(measure == "copy_count") %>%
  select(yr, ukr, rus) %>%
  pivot_longer(
    cols = c(ukr, rus),
    names_to = "language",
    values_to = "value"
  ) %>% 
  ggplot(aes(x = factor(yr), y = value, fill = language)) +
  geom_col(position = "dodge") +
  scale_x_discrete(expand = expansion(mult = c(0))) +
  labs(
    title = "Number of Copies: Ukrainian vs Russian by Year",
    x = "Year",
    y = "Number of Copies (ths.)",
    fill = "Language"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom"
  )

g_ukrrus_copies_plotly <- ggplotly(g_ukrrus_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_ukrrus_copies_plotly
Making graph
rm(g_ukrrus_copies)

Second graph describes the number of titles of Ukrainian and russian titles for each year since 2005 to 2024.

Making graph
g_ukrrus_title <- 
  ds_ukr_rus %>%
  filter(measure == "title_count") %>%
  select(yr, ukr, rus) %>%
  pivot_longer(
    cols = c(ukr, rus),
    names_to = "language",
    values_to = "value"
  ) %>% 
  ggplot(aes(x = factor(yr), y = value, fill = language)) +
  geom_col(position = "dodge") +
  scale_x_discrete(expand = expansion(mult = c(0))) +
  scale_y_continuous(breaks = seq(0, 20000, by = 5000), limits = c(0, 20000)) +
  labs(
    title = "Number of Titles: Ukrainian vs Russian by Year",
    x = "Year",
    y = "Number of Titles",
    fill = "Language"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom"
  )

g_ukrrus_title_plotly <- ggplotly(g_ukrrus_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_ukrrus_title_plotly
Making graph
rm(g_ukrrus_title)

Overral

This document was created to help you gain a deeper understanding of the data. I hope you found it useful! 😉